Parallel computation of phylogenetic consensus trees
Abstract
The field of bioinformatics is witnessing a rapid and overwhelming accumulation of molecular sequence data, driven predominantly by novel wet-lab sequencing techniques. This trend poses scalability challenges for tool developers. In phylogenetic inference (the reconstruction of evolutionary trees from molecular sequence data), scalability is becoming increasingly important for operations other than tree reconstruction itself. In this paper we focus on a post-analysis task in reconstructing very large trees: building (extended) majority-rule consensus trees from a collection of equally plausible trees or from a collection of bootstrap replicate trees. To this end, we present sequential optimizations that make our implementation the fastest exact implementation currently available in phylogenetics, as well as novel parallelized routines that are the first of their kind. Our sequential optimizations yield a 50-fold performance improvement over the previous version of our code, and we achieve a maximum speedup of 5.5 on an 8-core Nehalem node when building consensus trees for trees comprising up to 55,000 organisms. The methods developed here are integrated into the widely used open-source tool RAxML for phylogenetic tree inference.
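To make the consensus step concrete, here is a minimal, sequential Python sketch of (extended) majority-rule consensus computed by counting bipartitions (splits) across a tree collection. It is not RAxML's code and reflects none of the optimizations or parallelization reported above; the tree representation (each tree given as a list of its non-trivial splits), the function names, and the tie-breaking rule are all illustrative assumptions.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set

# Hypothetical representation: each tree is a list of its non-trivial splits
# (bipartitions), each split written as one side (a set of taxon names).
# Both sides describe the same bipartition, so we normalize every split to
# the side that does NOT contain a fixed reference taxon.
Split = FrozenSet[str]
Tree = List[Set[str]]


def normalize(side: Iterable[str], taxa: FrozenSet[str], ref: str) -> Split:
    """Return the side of the split that excludes the reference taxon."""
    s = frozenset(side)
    return taxa - s if ref in s else s


def count_splits(trees: List[Tree], taxa: FrozenSet[str], ref: str) -> Counter:
    """Count in how many trees each normalized split occurs."""
    counts: Counter = Counter()
    for tree in trees:
        # A set comprehension, so a split listed twice in one tree counts once.
        counts.update({normalize(side, taxa, ref) for side in tree})
    return counts


def compatible(a: Split, b: Split) -> bool:
    """Splits normalized to exclude the same reference taxon can coexist
    in one tree iff they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def majority_rule(trees: List[Tree], taxa: FrozenSet[str], ref: str) -> Set[Split]:
    """Majority-rule consensus: keep splits found in more than half the trees."""
    counts = count_splits(trees, taxa, ref)
    return {s for s, c in counts.items() if 2 * c > len(trees)}


def extended_majority_rule(trees: List[Tree], taxa: FrozenSet[str], ref: str) -> Set[Split]:
    """Greedy extended majority rule: visit splits by decreasing frequency and
    keep each one that is compatible with every split kept so far."""
    counts = count_splits(trees, taxa, ref)
    kept: List[Split] = []
    # Ties are broken by sorted taxon names (an arbitrary, deterministic choice).
    for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        if all(compatible(s, k) for k in kept):
            kept.append(s)
    return set(kept)


if __name__ == "__main__":
    taxa = frozenset("ABCDE")
    trees = [
        [{"A", "B"}, {"D", "E"}],
        [{"A", "C"}, {"D", "E"}],
        [{"A", "D"}, {"B", "C"}],
    ]
    print(sorted(sorted(s) for s in majority_rule(trees, taxa, "A")))  # [['D', 'E']]
    print(sorted(sorted(s) for s in extended_majority_rule(trees, taxa, "A")))  # [['B', 'C'], ['D', 'E']]
```

Splits that each occur in more than half of the trees must co-occur in at least one tree and are therefore pairwise compatible, so the plain majority-rule set needs no compatibility test; the extended variant greedily adds less frequent splits, which is where the test matters. For collections with tens of thousands of taxa, an efficient implementation would more likely store splits as bit vectors in a hash table than as Python sets.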
Similar resources
A fully resolved consensus between fully resolved phylogenetic trees.
Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method, putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most b...
The transposition distance for phylogenetic trees
The search for similarity and dissimilarity measures on phylogenetic trees has been motivated by the computation of consensus trees, the search by similarity in phylogenetic databases, and the assessment of clustering results in bioinformatics. The transposition distance for fully resolved phylogenetic trees is a recent addition to the extensive collection of available metrics for comparing phy...
Reconstruction of Maximum Likelihood Phylogenetic Trees in Parallel Environment Using Logic Programming
With rapid increase of nucleotide and amino acid sequence data, it is required to develop reliable and flexible application programs to infer molecular phylogenetic trees. The maximum likelihood method is known to be robust among many methods for reconstruction of molecular phylogenetic trees, however, this method requires extremely high computational cost. Although parallel computation is a good...
Point estimates in phylogenetic reconstructions
MOTIVATION The construction of statistics for summarizing posterior samples returned by a Bayesian phylogenetic study has so far been hindered by the poor geometric insights available into the space of phylogenetic trees, and ad hoc methods such as the derivation of a consensus tree make up for the ill-definition of the usual concepts of posterior mean, while bootstrap methods mitigate the absen...
Distributed and parallel algorithms and systems for inference of huge phylogenetic trees based on the maximum likelihood method
The computation of large phylogenetic (evolutionary) trees from DNA sequence data based on the maximum likelihood criterion is most probably NP-complete. Furthermore, the computation of the likelihood value for one single potential tree topology is computationally intensive. This thesis introduces a number of algorithmic and technical solutions which for the first time enable parallel inference...
Publication date: 2010